The palmerpenguins data contains size measurements for
three penguin species observed on three islands in the Palmer
Archipelago, Antarctica.
These data were collected from 2007 - 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program, part of the US Long Term Ecological Research Network. The data were imported directly from the Environmental Data Initiative (EDI) Data Portal, and are available for use by CC0 license (“No Rights Reserved”) in accordance with the Palmer Station Data Policy.
You can install the released version of palmerpenguins from CRAN with:
install.packages("palmerpenguins")
Or install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("allisonhorst/palmerpenguins")
This package contains two datasets:
Here, we’ll focus on a curated subset of the raw data in the
package named penguins.
The raw data, accessed from the Environmental Data Initiative (see
full data citations below), is also available as
palmerpenguins::penguins_raw.
The curated palmerpenguins::penguins dataset contains 8
variables (n = 344 penguins). You can read more about the variables by
typing ?penguins.
glimpse(penguins)
#> Rows: 344
#> Columns: 8
#> $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
#> $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
#> $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
#> $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
#> $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
#> $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
#> $ sex <fct> male, female, female, NA, female, male, female, male…
#> $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
The palmerpenguins::penguins data contains 333 complete
cases, with 19 missing values.
The penguins data has three factor variables:
penguins %>%
dplyr::select(where(is.factor)) %>%
glimpse()
#> Rows: 344
#> Columns: 3
#> $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie…
#> $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torgers…
#> $ sex <fct> male, female, female, NA, female, male, female, male, NA, NA, …
# Count penguins for each species / island
penguins %>%
count(species, island, .drop = FALSE)
#> # A tibble: 9 × 3
#> species island n
#> <fct> <fct> <int>
#> 1 Adelie Biscoe 44
#> 2 Adelie Dream 56
#> 3 Adelie Torgersen 52
#> 4 Chinstrap Biscoe 0
#> 5 Chinstrap Dream 68
#> 6 Chinstrap Torgersen 0
#> 7 Gentoo Biscoe 124
#> 8 Gentoo Dream 0
#> 9 Gentoo Torgersen 0
ggplot(penguins, aes(x = island, fill = species)) +
geom_bar(alpha = 0.8) +
scale_fill_manual(values = c("darkorange","purple","cyan4"),
guide = FALSE) +
theme_minimal() +
facet_wrap(~species, ncol = 1) +
coord_flip()
# Count penguins for each species / sex
penguins %>%
count(species, sex, .drop = FALSE)
#> # A tibble: 8 × 3
#> species sex n
#> <fct> <fct> <int>
#> 1 Adelie female 73
#> 2 Adelie male 73
#> 3 Adelie <NA> 6
#> 4 Chinstrap female 34
#> 5 Chinstrap male 34
#> 6 Gentoo female 58
#> 7 Gentoo male 61
#> 8 Gentoo <NA> 5
ggplot(penguins, aes(x = sex, fill = species)) +
geom_bar(alpha = 0.8) +
scale_fill_manual(values = c("darkorange","purple","cyan4"),
guide = FALSE) +
theme_minimal() +
facet_wrap(~species, ncol = 1) +
coord_flip()
# Penguins are fun to summarize!
penguins %>%
count(species)
#> # A tibble: 3 × 2
#> species n
#> <fct> <int>
#> 1 Adelie 152
#> 2 Chinstrap 68
#> 3 Gentoo 124
penguins %>%
group_by(species) %>%
summarize(across(where(is.numeric), mean, na.rm = TRUE))
#> # A tibble: 3 × 6
#> species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g year
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Adelie 38.8 18.3 190. 3701. 2008.
#> 2 Chinstrap 48.8 18.4 196. 3733. 2008.
#> 3 Gentoo 47.5 15.0 217. 5076. 2008.
penguins %>%
dplyr::select(body_mass_g, ends_with("_mm")) %>%
glimpse()
#> Rows: 344
#> Columns: 4
#> $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
#> $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
#> $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
#> $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
# Scatterplot example 1: penguin flipper length versus body mass
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point(aes(color = species,
shape = species),
size = 2) +
scale_color_manual(values = c("darkorange","darkorchid","cyan4"))
# Scatterplot example 2: penguin bill length versus bill depth
ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species,
shape = species),
size = 2) +
scale_color_manual(values = c("darkorange","darkorchid","cyan4"))
You can add color and/or shape aesthetics in ggplot2 to
layer in factor levels like we did above. With three factor variables to
work with, you can add another factor layer with facets, like the plot
below.
ggplot(penguins, aes(x = flipper_length_mm,
y = body_mass_g)) +
geom_point(aes(color = sex)) +
scale_color_manual(values = c("darkorange","cyan4"),
na.translate = FALSE) +
facet_wrap(~species)
The culmen is the upper ridge of a bird’s bill. In the simplified
penguins data, culmen length and depth are renamed as
variables bill_length_mm and bill_depth_mm to
be more intuitive.
For this penguin data, the culmen (bill) length and depth are measured as shown below (thanks Kristen Gorman for clarifying!):
# Jitter plot example: bill length by species
ggplot(data = penguins, aes(x = species, y = bill_length_mm)) +
geom_jitter(aes(color = species),
width = 0.1,
alpha = 0.7,
show.legend = FALSE) +
scale_color_manual(values = c("darkorange","darkorchid","cyan4"))
# Histogram example: flipper length by species
ggplot(data = penguins, aes(x = flipper_length_mm)) +
geom_histogram(aes(fill = species), alpha = 0.5, position = "identity") +
scale_fill_manual(values = c("darkorange","darkorchid","cyan4"))
Data originally published in:
Individual datasets:
Individual data can be accessed directly via the Environmental Data Initiative:
Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Adélie penguins (Pygoscelis adeliae) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. https://doi.org/10.6073/pasta/98b16d7d563f265cb52372c8ca99e60f (Accessed 2020-06-08).
Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Gentoo penguin (Pygoscelis papua) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. https://doi.org/10.6073/pasta/7fca67fb28d56ee2ffa3d9370ebda689 (Accessed 2020-06-08).
Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Chinstrap penguin (Pygoscelis antarcticus) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 6. Environmental Data Initiative. https://doi.org/10.6073/pasta/c14dfcfada8ea13a17536e73eb6fbe9e (Accessed 2020-06-08).